The taraXÜ corpus of human-annotated machine translations
Abstract
Human translators are key to evaluating machine translation (MT) quality and to addressing the as-yet unanswered question of when and how to use MT in professional translation workflows. This paper describes the corpus developed as the result of a detailed, large-scale human evaluation consisting of three tightly connected tasks: ranking, error classification, and post-editing.
Similar resources
SubCo: A Learner Translation Corpus of Human and Machine Subtitles
In this paper, we present a freely available corpus of human and automatic translations of subtitles. The corpus comprises the original English subtitles (SRC), both human (HT) and machine translations (MT) into German, as well as post-editions (PE) of the MT output. HT and MT are annotated with errors. Moreover, human evaluation is included in HT, MT, and PE. Such a corpus is a valuable resour...
Compiling and Using a Shareable Parallel Corpus for Machine Translation Evaluation
TECMATE is a dynamic TEchnical Corpus for MAchine Translation Evaluation currently being compiled and used at the University of Leeds. A purpose-built corpus for machine translation (MT) evaluation differs in terms of size and content from corpora used for other kinds of linguistic analysis. For example, our research in automated MT evaluation requires source texts with human and machine transl...
A Richly Annotated, Multilingual Parallel Corpus for Hybrid Machine Translation
In recent years, machine translation (MT) research has focused on investigating how hybrid machine translation as well as system combination approaches can be designed so that the resulting hybrid translations show an improvement over the individual “component” translations. As a first step towards achieving this objective we have developed a parallel corpus with source text and the correspondi...
Towards Optimal Choice Selection for Improved Hybrid Machine Translation
In recent years, machine translation (MT) research has focused on investigating how hybrid MT as well as MT combination systems can be designed so that the resulting translations give an improvement over the individual translations. As a first step towards achieving this objective we have developed a parallel corpus with source data and the output of a number of MT systems, annotated with metadata i...
Building an Arabic Machine Translation Post-Edited Corpus: Guidelines and Annotation
We present our guidelines and annotation procedure for creating a human-corrected, post-edited corpus of machine translations for Modern Standard Arabic. Our overarching goal is to use the annotated corpus to develop automatic machine translation post-editing systems for Arabic that can be used to help accelerate the human revision process of translated texts. The creation of any manually annotated ...